AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Neural Information Processing Systems, Feb-11-2026, 21:47:36 GMT

f337d999d9ad116a7b4f3d409fcc6480-Paper.pdf

aac, action repetition, repetition, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Santa Clara County > Cupertino (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing Systems, Feb-7-2026, 06:52:57 GMT

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

In reinforcement learning, continuous time is often discretized by a time scale δ, to which the resulting performance is known to be highly sensitive. In this work, we seek to find a δ-invariant algorithm for policy gradient (PG) methods, which performs well regardless of the value of δ. We first identify the underlying reasons that cause PG methods to fail as δ → 0, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to have δ-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel δ-invariant method named Safe Action Repetition (SAR) applicable to any existing PG algorithm. SAR can handle the stochasticity of environments by adaptively reacting to changes in states during action repetition.
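The idea of repeating an action while watching for state changes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopping rule (stop once the state has moved more than `radius` from where the repetition began), the function names, and the 1-D toy environment are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length state tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def repeat_until_state_changes(step, state, action, radius, max_repeats=10_000):
    """Repeat `action` until the state leaves a ball of size `radius`.

    Assumed stopping rule (hypothetical): the repetition ends as soon as the
    current state is farther than `radius` from the state where it started,
    or when the episode terminates.
    """
    start = state
    total_reward = 0.0
    repeats = 0
    done = False
    while not done and repeats < max_repeats:
        state, reward, done = step(state, action)
        total_reward += reward
        repeats += 1
        if distance(state, start) > radius:
            break
    return state, total_reward, repeats


def make_point_env(delta):
    """Hypothetical 1-D toy environment: velocity control with time step delta."""
    def step(state, action):
        next_state = (state[0] + action * delta,)
        reward = -delta  # constant time cost, purely illustrative
        return next_state, reward, False
    return step
```

In this toy setting with unit velocity and `radius=0.55`, the action is repeated 6 times at δ = 0.1 and roughly ten times as often at δ = 0.01, so the elapsed time per decision stays about the same. Because the stop condition depends on the state rather than on a step count, an unexpected jump in a noisy environment would also end the repetition early.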

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
Asia > Vietnam > Long An Province (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Neural Information Processing Systems, Aug-18-2025, 20:26:56 GMT

f337d999d9ad116a7b4f3d409fcc6480-Paper.pdf

machine learning, reinforcement learning, taac, (14 more...)

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Santa Clara County > Cupertino (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing Systems, Jan-19-2025, 13:38:00 GMT

TAAC: Temporally Abstract Actor-Critic for Continuous Control

continuous control, taac, temporally abstract actor-critic, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

arXiv.org Artificial Intelligence, Feb-8-2024

Learning Uncertainty-Aware Temporally-Extended Actions

Lee, Joongkyu, Park, Seung Joon, Tang, Yunhao, Oh, Min-hwan

In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique to facilitate policy learning through extended actions. However, a primary limitation in previous studies of action repetition is its potential to degrade performance, particularly when sub-optimal actions are repeated. This issue often negates the advantages of action repetition. To address this, we propose a novel algorithm named Uncertainty-aware Temporal Extension (UTE). UTE employs ensemble methods to accurately measure uncertainty during action extension. This feature allows policies to strategically choose between emphasizing exploration or adopting an uncertainty-averse approach, tailored to their specific needs. We demonstrate the effectiveness of UTE through experiments in Gridworld and Atari 2600 environments. Our findings show that UTE outperforms existing action repetition algorithms, effectively mitigating their inherent limitations and significantly enhancing policy learning efficiency.
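One way to picture the choice between exploration and uncertainty aversion is the sketch below. It is an assumption-laden illustration, not UTE itself: the scoring rule (ensemble mean plus a signed weight times the ensemble standard deviation) and the names `choose_extension` and `uncertainty_weight` are hypothetical.

```python
import statistics


def choose_extension(ensemble_values, uncertainty_weight):
    """Pick an extension length from ensemble value estimates.

    ensemble_values: dict mapping an extension length (int) to a list of
        value estimates, one per ensemble member (hypothetical layout).
    uncertainty_weight: positive values favor uncertain lengths (exploration),
        negative values penalize them (uncertainty-averse), zero ignores
        uncertainty. This scoring rule is an assumption of the sketch.
    """
    def score(length):
        estimates = ensemble_values[length]
        mean = statistics.mean(estimates)
        spread = statistics.pstdev(estimates)  # ensemble disagreement
        return mean + uncertainty_weight * spread

    return max(ensemble_values, key=score)
```

For example, given `{1: [1.0, 1.0, 1.0], 4: [0.0, 1.0, 2.6]}`, a weight of `1.0` selects the length-4 extension, whose estimates disagree strongly, while a weight of `-1.0` falls back to the single-step action that all ensemble members agree on.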

agent, algorithm, extension length, (17 more...)

2402.05439

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Patel, Devdhar, Sejnowski, Terrence, Siegelmann, Hava

Temporally Layered Architecture for Efficient Continuous Control

arXiv.org Artificial Intelligence, Aug-8-2023

We present a temporally layered architecture (TLA) for temporally adaptive control with minimal energy expenditure. The TLA layers a fast and a slow policy together to achieve temporal abstraction that allows each layer to focus on a different time scale. Our design draws on the energy-saving mechanism of the human brain, which executes actions at different timescales depending on the environment's demands. We demonstrate that beyond energy saving, TLA provides many additional advantages, including persistent exploration, fewer required decisions, reduced jerk, and increased action repetition. We evaluate our method on a suite of continuous control tasks and demonstrate the significant advantages of TLA over existing methods when measured over multiple important metrics. We also introduce a multi-objective score to qualitatively assess continuous control policies and demonstrate a significantly better score for TLA. Our training algorithm uses minimal communication between the slow and fast layers to train both policies simultaneously, making it viable for future applications in distributed control.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2305.18701

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Diego County > La Jolla (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Patel, Devdhar, Russell, Joshua, Walsh, Francesca, Rahman, Tauhidur, Sejnowski, Terrence, Siegelmann, Hava

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

arXiv.org Artificial Intelligence, Feb-5-2023

We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design is biologically inspired and draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) Closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) Partially open-loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open-loop control where the slow controller picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
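The closed-loop variant, in which a fast controller decides at each timestep whether to act on top of a slow controller, can be sketched as a simple control loop. The loop structure, the gate function `should_act`, and the toy controllers are assumptions for illustration and are not taken from the paper.

```python
def run_closed_loop(step, state, slow_policy, fast_policy, should_act,
                    horizon, slow_period):
    """Run a two-timescale control loop (hypothetical sketch).

    The slow policy picks a new action every `slow_period` steps. At every
    step the gate `should_act` decides whether the fast policy overrides
    the currently held slow action.
    """
    slow_action = None
    fast_interventions = 0
    trajectory = []
    for t in range(horizon):
        if t % slow_period == 0:
            slow_action = slow_policy(state)  # infrequent decision
        if should_act(state, slow_action):
            action = fast_policy(state)       # fast layer intervenes
            fast_interventions += 1
        else:
            action = slow_action              # keep repeating the slow action
        state = step(state, action)
        trajectory.append(state)
    return trajectory, fast_interventions


# Toy 1-D task (illustrative only): drive the state to zero.
def toy_step(state, action):
    return state + action


def toy_slow_policy(state):
    if state == 0:
        return 0.0
    return -0.3 if state > 0 else 0.3


def toy_fast_policy(state):
    return -state  # exact correction


def toy_should_act(state, slow_action):
    # Intervene only when the held slow action would overshoot the target.
    return abs(state) < abs(slow_action)
```

Starting from `1.0` with `slow_period=4` and `horizon=6`, the toy fast controller intervenes once, at the step where the held slow action would overshoot zero. Counting such interventions is one simple way to see how rarely the fast layer needs to act.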

evolutionary algorithm, machine learning, reinforcement learning, (13 more...)

2301.00723

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Sargent, Matthew J., Bentley, Peter J., Barry, Caswell, de Cothi, William

Temporally Extended Successor Representations

arXiv.org Artificial Intelligence, Sep-25-2022

We present a temporally extended variation of the successor representation, which we term t-SR. t-SR captures the expected state transition dynamics of temporally extended actions by constructing successor representations over primitive action repeats. This form of temporal abstraction does not learn a top-down hierarchy of pertinent task structures, but rather a bottom-up composition of coupled actions and action repetitions. This reduces the number of decisions required in control without learning a hierarchical policy. As such, t-SR directly considers the time horizon of temporally extended action sequences without the need for predefined or domain-specific options. We show that in environments with dynamic reward structure, t-SR is able to leverage both the flexibility of the successor representation and the abstraction afforded by temporally extended actions. Thus, in a series of sparsely rewarded gridworld environments, t-SR optimally adapts learnt policies far faster than comparable value-based, model-free reinforcement learning methods. We also show that the manner in which t-SR learns to solve these tasks requires the learnt policy to be sampled consistently less often than non-temporally extended policies.
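A successor representation over action repeats can be illustrated with a tabular temporal-difference target. The sketch below extends the standard one-step SR update to a k-step repeat. It is an assumption about how such an update could look, not the rule used in the paper, and the function names are hypothetical.

```python
def extended_sr_target(visited_states, next_sr_row, gamma, n_states):
    """Build the TD target for one temporally extended action.

    visited_states: states occupied while the primitive action was repeated,
        starting with the state where the repeat began (length k).
    next_sr_row: current SR estimate (list of length n_states) for the
        state and extended action chosen after the repeat ends.
    The target accumulates discounted occupancy over the k repeated steps
    and bootstraps from `next_sr_row`, discounted by gamma ** k.
    """
    target = [0.0] * n_states
    for i, s in enumerate(visited_states):
        target[s] += gamma ** i
    k = len(visited_states)
    for j in range(n_states):
        target[j] += (gamma ** k) * next_sr_row[j]
    return target


def td_update(sr_row, target, alpha):
    """Move the stored SR row a fraction `alpha` toward the target."""
    return [m + alpha * (t - m) for m, t in zip(sr_row, target)]
```

With `visited_states=[0, 1, 2]`, `gamma=0.5`, and a next row that puts all its mass on state 3, the target is `[1.0, 0.5, 0.25, 0.125]`: one update covers three primitive steps, which is how repeats reduce the number of times the policy has to be queried.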

machine learning, reinforcement learning, temporally, (12 more...)

2209.12331

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)